these three predictions, for example, if beta strand and helix but no loop region are
predicted simultaneously by the three lower-level networks.
Further tricks additionally improve the predictions of this software. In particular, many
sequences with similar structure are automatically aligned with the query sequence
(multiple alignment). With these refinements, the secondary structure prediction reaches
an accuracy of up to 80%, which is already very close to the theoretical optimum. The only
way to become even more accurate is to predict the three-dimensional structure at the
same time.
Question 14.5
One suitable program is MemBrain (https://www.membrain-nn.de/index.htm;
https://www.membrain-nn.de/).
Question 14.6
Please search the internet for deep learning and read up on the topic. A helpful starting
page is https://deeplearning.net/. Information on AlphaGo can also be found online
(https://deepmind.com/research/alphago; https://www.youtube.com/watch?v=mzpW10DPHeQ).
Question 14.7
Classification models are used in bioinformatics to distinguish between two categories
(binary classification), for example for the diagnosis of a disease (sick/healthy). It is
important here to become familiar with the classification table (confusion matrix: TP, FP,
FN, TN) and with the performance metrics used to evaluate a classification model
(sensitivity, false positive rate, specificity, PPV, NPV, accuracy, misclassification rate,
prevalence, ROC, AUC). It is also important to know the differences between, for example,
sensitivity and PPV, and between specificity and NPV: sensitivity and specificity describe
how the test behaves in people whose true status is known, whereas PPV and NPV describe
how far a given test result can be trusted. For example, imagine that a person receives a
positive (or negative) result from a predictive test with a sensitivity of 90%, a
specificity of 99%, a PPV of 80%, and an NPV of 99%. A positive result can then be trusted
only to 80% to indicate a truly sick person (20% of positive results are false positives,
i.e., the person is fortunately healthy), whereas a negative result is more trustworthy
(only 1% of negative results are false negatives, i.e., the person is actually sick). Most
diagnostic testing procedures take this into account and, in the case of a positive result,
carry out a second test to confirm the diagnosis (e.g., mammography screening). On the
other hand, a test should in any case be accurate enough that a negative result identifies
a healthy person with high probability: it would be worse to send home a supposedly
healthy person (negative test result) who is in fact sick (false negative) and who
therefore receives no therapy or infects other people with a virus (e.g., COVID-19). In
addition, one should think about problems in creating a classification model (little data,
etc.) and about the requirements such a model should meet. To build a predictive model, it
is advisable to split the data into a training and a test dataset (80%/20%) and to validate
the model on at least one independent dataset in order to better evaluate its predictive
power.
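The relationship between these metrics can be made concrete with a small sketch in Python. It is only an illustration under stated assumptions: the class name ConfusionMatrix, the function predictive_values, and the cohort of 1000 people are hypothetical and not taken from the exercise. The counts were chosen so that they roughly reproduce the test described above (sensitivity 90%, specificity about 99%, PPV 80%, NPV about 99%). The second function shows, via Bayes' rule, why PPV and NPV depend on the prevalence while sensitivity and specificity do not.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts of a binary classification table (hypothetical helper)."""
    tp: int  # sick, test positive
    fp: int  # healthy, test positive
    fn: int  # sick, test negative
    tn: int  # healthy, test negative

    def metrics(self) -> dict:
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "false_positive_rate": self.fp / (self.fp + self.tn),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "accuracy": (self.tp + self.tn) / total,
            "misclassification_rate": (self.fp + self.fn) / total,
            "prevalence": (self.tp + self.fn) / total,
        }


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple:
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Assumed cohort: 1000 people, 40 of them sick (prevalence 4%).
cohort = ConfusionMatrix(tp=36, fp=9, fn=4, tn=951)
for name, value in cohort.metrics().items():
    print(f"{name:>24}: {value:.3f}")

# Same test, different prevalences: the PPV changes strongly.
for prevalence in (0.001, 0.04, 0.5):
    ppv, npv = predictive_values(0.90, 0.99, prevalence)
    print(f"prevalence={prevalence:<6} PPV={ppv:.3f} NPV={npv:.3f}")
```

For the assumed cohort, the PPV is 0.80 and the NPV is about 0.996, although sensitivity and specificity are both at least 90%. With the same test applied to a population in which the disease is rare, the PPV drops considerably, which is one reason why a positive screening result is usually confirmed by a second test.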
20 Solutions to the Exercises